Abstract
Researchers in the social sciences are increasingly drawing upon citizen science approaches for the collection of data. These approaches seek to involve non-professional scientists in the research process.
However, concerns surrounding the quality of data produced by citizen scientists hinder further adoption of citizen science approaches. Protocols have been developed to detect and remove low-quality contributions and to ensure that data produced by citizen scientists are of sufficient quality; however, these protocols are difficult to apply to approaches such as ecological momentary assessment, which focus on generating data on subjective phenomena. Researchers using these approaches have tended to use engagement, typically operationalised as the quantity of contributions made by a participant, as a proxy measurement for the quality of contributions.
However, the extent to which engagement is a good proxy for data quality remains unknown. If it is not, this approach can create issues for the reproducibility of results by undermining the statistical power of studies, and by creating a large number of arbitrary researcher degrees of freedom.
Using data from the Britain Breathing project and antihistamine prescription data from the NHS, we explore the relationship between data quality and engagement, as operationalised in various approaches identified in the literature. Furthermore, we explore how removing low-engagement users impacts the representativeness of the data, and how this varies depending on how engagement is operationalised.
Citizen science is an increasingly popular approach to research across disciplines. Whilst citizen science is best developed and most popular in natural science disciplines such as ecology and astronomy, scholars have noted a rise in other fields such as geography, health, criminology, and bio-medical research (Solymosi, et al. 2021; Estellés-Arolas, 2020; Trojan, et al. 2019; Wiggins & Wilbanks, 2019).
Citizen science approaches provide a number of benefits over more conventional approaches. They are able to engage non-professionals in the research process, providing personal benefits to participants and leading to long-term behavioural changes (Agnello, et al. 2022; Church, et al. 2019). Furthermore, they are able to effectively mobilise local knowledge (Elliott and Rosenberg, 2014; Wynn, 1989: 34), and the relative cost-effectiveness of data produced by citizen scientists as compared to traditional sources of data has proven particularly attractive for studying phenomena with large spatial and temporal scales (Aceves-Bueno, et al. 2017: 3).
However, concerns surrounding the quality of data produced by citizen scientists have hindered further adoption of citizen science approaches (Aceves-Bueno, et al. 2017; Basiri, et al. 2019; Elliott & Rosenberg, 2019; Lukyanenko, et al. 2016; Riesch & Potter, 2014).
Whilst it has been shown that in some cases the quality of data provided by citizen scientists can rival that of professional scientists, citizen science data tend to be more heterogeneous in quality. A 2017 review found that agreement between professionally collected data and data provided by citizen scientists is greatest when there are a large number of citizen scientists, where training is provided, and where the research directly relates to participants’ economic and health situations (Aceves-Bueno, et al. 2017).
Even where data provided by citizen scientists shows large levels of agreement, the perception that citizen science data may not be reliable remains a hindrance (Riesch & Potter, 2014).
Identifying and removing low-quality and careless responses therefore remains essential, both for the credibility of the data and for valid inferences to be made (Huang, 2018; Johnson & Sieber, 2013; Ternovski, 2022; McGonagle, et al. 2016). Huang (2018) demonstrates that insufficient-effort responses can have a confounding effect on variables of interest, and can inflate observed correlations. Ternovski argues further that inattentive respondents can introduce “substantial measurement error and attenuation bias” (2022: 1). In a power analysis, they found that, without screening for attention, almost four times more participants were needed to achieve 80% power for their statistical tests.
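Ternovski’s point about power can be illustrated with a back-of-the-envelope calculation: the required sample size for a two-sample test scales with the inverse square of the effect size, so halving the observed effect (for example, through attenuation bias) roughly quadruples the participants needed. The sketch below uses only the normal approximation and hypothetical effect sizes; it is not a reproduction of Ternovski’s analysis.

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample test
    (normal approximation): n ~ 2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2

true_d = 0.5                 # hypothetical true standardised effect
attenuated_d = true_d * 0.5  # effect observed with inattentive responses mixed in

print(round(n_per_group(true_d)))       # ~63 per group
print(round(n_per_group(attenuated_d))) # ~251 per group, roughly four times more
```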
In response, approaches have been developed to mitigate concerns around data quality in citizen science studies, such as the implementation of quality assurance standards and protocols (Minghini, et al. 2017; Fonte, et al. 2017; Samulowska, et al. 2021).
This includes approaches that leverage large amounts of redundancy in the data collection or classification approach (Balázs, et al. 2021; Lintott et al. 2008), or approaches where strong priors about the distribution of likely observations enable the flagging of unlikely reports for further investigation (Salganik, 2019; Kelling et al. 2012).
Redundancy can be leveraged in projects where an object may be mapped and revised by many different users, such as OpenStreetMap (Balázs, et al. 2021), or in projects such as Galaxy Zoo, where the same image can be shown to many participants (Lintott et al. 2008); a high number of concurring interpretations of the image/object can then be used as an indicator of the quality of the classification.
Another approach is to use prior knowledge about the phenomenon in question to flag unlikely reports, either for re-evaluation by an expert or a user, or for more information such as a picture. For example, in the eBird project reports of “very rare species, very high counts, or out-of-season reports” are automatically flagged and a request for additional information, including a photograph, is sent to the participant (Salganik, 2019). This information is then sent to a regional expert for validation who then decides whether the report is legitimate (Kelling et al. 2012).
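A rule of this kind can be sketched as a simple filter. The species, counts, and months below are entirely hypothetical; the real eBird filters are regional, season-specific, and maintained by experts.

```python
# Illustrative sketch of an eBird-style flagging rule (hypothetical thresholds).
def flag_report(species, count, month, filters):
    """Return True if a report should be routed to an expert for review."""
    rule = filters.get(species)
    if rule is None:
        return True  # species not expected in the region at all
    too_many = count > rule["max_count"]       # "very high counts"
    out_of_season = month not in rule["months"]  # "out-of-season reports"
    return too_many or out_of_season

filters = {"barn swallow": {"max_count": 200, "months": range(4, 11)}}

flag_report("barn swallow", 50, 6, filters)   # in season, plausible count -> False
flag_report("barn swallow", 50, 1, filters)   # out of season -> True
```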
However, these approaches are not universally applicable, and are often developed in contexts where there is a relatively large amount of information about participants, their contribution histories, and the objects or phenomena about which they are collecting data.
For some approaches, such as studies using experience sampling methodology (ESM), which typically focus on generating data about the subjective experiences of participants, these techniques are difficult to apply, in particular when studies are opt-in and joining is as simple as downloading a smartphone application.
The validation approaches described above tend to assume that there is a “ground truth” which can be measured and used to validate the data provided by participants, for example using professionally gathered data, or large amounts of citizen science data with repeated observations of the phenomena of interest. However, when the data is about the subjective experiences and perceptions of participants, the aim is not to discover the “truth” about some external phenomenon, but to understand how emotions and symptoms are experienced subjectively in participants’ daily lives.
However, for the reasons described above, there remains the need to determine whether reports about participants’ subjective experiences were completed thoughtfully or carelessly (Jaso, et al. 2021).
Given the difficulties in applying standard data quality protocols to ESM data, the most common approach in these situations has been to use engagement, typically operationalised as the quantity of contributions submitted by a participant, as a proxy for the quality of the data. The assumption is that quality and quantity are associated, since both the attentiveness with which reports are completed and the number of reports completed are at least partially determined by a participant’s underlying motivation and level of engagement with the study (Doherty, et al. 2020; Geerharts, 2021).
Jaso, et al (2021) note that, for ESM studies, the closest “best-practice” standard for cleaning data is “the tendency to remove participants who do not meet an a-priori compliance cut off defined by the percent of surveys completed” (3).
However, excluding participants has several potential downsides, including “rejecting legitimate responses, reducing power, and reducing sample’s representativeness” (Ternovski, 2022). Furthermore, the arbitrariness of exclusion criteria and the large number of researcher degrees of freedom they create can lead to multiple-comparison problems, which can undermine the reproducibility of results (Steegen, et al. 2016; Gelman & Loken, 2013; Simonsohn, Simmons, & Nelson, 2020; Ioannidis, 2008). This is especially the case for exclusion criteria chosen post-hoc (Wicherts, et al. 2016: 5).
Furthermore, it remains to be established whether low-engagement users systematically produce lower-quality data, and our understanding of the motivations, levels of carelessness, and quality of the data produced by low-engagement participants remains limited (Eveleigh, et al. 2014; Jaso, et al. 2021; Welling, et al. 2021).
This paper seeks to understand the impact of excluding data from low-engagement users on data quality, data representativeness, and study conclusions. To do so, we seek to answer the following research questions:
Do lower engagement users provide lower quality data than high engagement users?
Does the way engagement is operationalised moderate the relationship between data quality and engagement?
How does including/excluding low-engagement users affect the representativeness of the sample?
Do certain ways of operationalising engagement provide systematically better trade-offs in terms of data quality, sample size, and representativeness?
To answer these questions, this paper uses data from Britain Breathing, a smartphone application-based ESM study on the symptoms of allergic rhinitis (commonly referred to as hay fever), and antihistamine prescription data from OpenPrescribing.net.
Finally this paper explores whether demographic differences between high and low engagement users, the key factor in determining how much rejecting low-engagement users will affect the sample’s representativeness, are sensitive to how engagement is operationalised.
This paper uses data on allergic rhinitis symptoms from the Britain Breathing application and data on antihistamine prescriptions from OpenPrescribing.net.
Britain Breathing data:
The Britain Breathing project is a citizen science study which used a smartphone-based experience sampling approach to collect geolocated time-series data on seasonal pollen allergy symptoms, also referred to as allergic rhinitis or hay fever (Vigo, et al. 2018).
To enroll in the study, participants downloaded and installed the Britain Breathing application to their smartphones from the Apple App Store for iOS users or the Google Play Store for Android users.
Upon installation, participants were invited to provide some basic information such as their gender, age, allergy history, and whether or not they were taking antihistamine medication. Users of the application were then able to report symptoms at any time; furthermore, they had the option of reporting at daily scheduled intervals.
When participants made a report, they were first asked how they were feeling. If they responded that they were feeling well, no further questions were asked; if they responded otherwise, they were asked further questions about the severity of various symptoms (“nose”, “eyes”, and “breathing”), which they reported on a four-point scale ranging from 0, which signifies an absence of symptoms, to 3, which indicates severe symptoms. However, a bug in the application meant that some users (XXXX there may be another bug, waiting to hear back on that XXXX)
The date, time, and location of each submission is automatically recorded by the application.
The data has good geographical coverage, with reports submitted from 119 out of 124 (~96%) postcode areas in the UK, with the average number of reports per postcode being 305.
Three versions of the application were deployed, v2016, v1 and v2, the use of which is shown in Figure x.
NHS England prescription data:
As a means of validating the data collected using the Britain Breathing application, Vigo, et al (2018) look at the correlation between the median lack of wellness reported on the Britain Breathing application and the number of antihistamines prescribed by general practitioners, finding that they are very strongly (r = 0.93) correlated. This data is openly available from OpenPrescribing.net.
The way in which engagement is understood and operationalised varies considerably between scholars and fields (Perski, et al. 2017; Yardley, et al. 2016). This paper seeks to establish how sensitive the impact of removing disengaged users is to the way engagement is operationalised. To do this, we classify participants according to different clustering methods identified in the ESM literature, and compare conclusions between them.
Threshold methods
Jaso, et al (2021) describe a general trend in the EMA/ESM literature to include only participants with a 70%–90% compliance rate, or those who meet some other threshold, highlighting that this is often selected by convention and not empirically determined (2021: 3). For example, Sun, et al (2020) consider participants who completed fewer than 10 assessments to be unengaged, and excluded them from their analysis.
A more data-driven approach is found in a study by Kronkvist and Engstrom (2020), who split their participants into abstainers, who completed zero assessments; dedicated participants, who completed a number of assessments one standard deviation or more above the average number of assessments; and occasional participants, who did not meet the criteria for the two previous groups.
In this paper, we cluster participants according to various threshold methods. Since participants in Britain Breathing are not prompted to make reports according to a predetermined schedule, compliance rates cannot be used. We therefore focus on thresholds based on the number of reports submitted.
We use thresholds of 1, 3, 5, and 10 contributions, as well as thresholds based on the average number of assessments, following the approach of Kronkvist and Engstrom (2020) (with the exception that there is no category matching their abstainers category, since participants who completed zero assessments do not appear in the data).
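The two clustering rules can be sketched as follows. The contribution counts are invented for illustration; in the analysis itself, the Kronkvist and Engstrom cut-off is computed per year from the Britain Breathing data.

```python
from statistics import mean, stdev

# Hypothetical contribution counts, one entry per participant.
contributions = [1, 1, 2, 3, 5, 8, 12, 30, 1, 4]

def threshold_labels(counts, threshold):
    """Fixed-threshold rule: 'low' engagement below the cut-off."""
    return ["low" if c < threshold else "high" for c in counts]

def kronkvist_engstrom_labels(counts):
    """Data-driven rule after Kronkvist and Engstrom (2020): 'dedicated'
    participants are one SD or more above the mean; everyone else here is
    'occasional' (no abstainers, since zero-contribution users are absent)."""
    cut = mean(counts) + stdev(counts)
    return ["dedicated" if c >= cut else "occasional" for c in counts]

print(threshold_labels(contributions, 5))
print(kronkvist_engstrom_labels(contributions))
```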
Hidden Markov models
Druce, et al (2017) argue threshold approaches overlook the complexity of patterns of engagement. In particular they do not include information on the continuity of data entry, focusing instead on the total amount of submissions.
For example a participant who reports every day for two weeks may plausibly be more engaged than a participant who reports on average twice a week for four months, even though the latter will have made more contributions. Threshold approaches risk excluding participants who are highly engaged, but for short periods of time.
As an alternative, the authors propose a three-step process using first-order hidden Markov models.
Hidden Markov models assume that the observations are being generated by a first-order Markov process with unobservable hidden states. First-order Markov processes are systems in which the probability of observing a given state at any given event is determined entirely by the state at the previous event. For a hidden Markov model the state which explains the observations is not observable, and must be inferred from the sequence of observations.
In this case, the observations were whether or not a participant had made a contribution on a given day. One limitation of this approach is that some participants very occasionally made more than one report in a single day, potentially indicating higher engagement, which was not accounted for in the model.
The latent “hidden” state in this case is how engaged the participant was. The model assumed that whether or not a participant would contribute on a given day was explained by them being in one of three latent states: high engagement (with a high probability of contributing), low engagement (with a low probability of contributing), and disengaged (with a probability of contributing set at 1e-10, as the software used does not allow values of zero).
The disengaged state is what is known as an absorbing state, meaning that once a participant was in the disengaged state, their probability of transitioning to another state was zero.
A further assumption was that all participants were in the highly engaged state on the day they began the study.
The process to cluster participants was as follows:
Create a dummy variable for whether or not a participant made a contribution on a given day.
Fit hidden Markov models according to the parameters described above using the depmixS4 package for R (Visser, 2021) to estimate the latent level of engagement for each participant on a given day.
Assign participants to clusters according to the history of latent states inferred by the hidden Markov models, using a Markov mixture model.
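As a rough illustration of the decoding step, the sketch below hand-rolls Viterbi decoding of a daily 0/1 contribution sequence for a three-state model with an absorbing disengaged state. The transition and emission probabilities are invented; in the analysis itself, these parameters are estimated from the data with depmixS4, which this sketch does not reproduce.

```python
import math

# Hypothetical parameters; in the analysis they are estimated with depmixS4.
states = ["high", "low", "disengaged"]
p_contrib = {"high": 0.8, "low": 0.2, "disengaged": 1e-10}
trans = {  # trans[s][t] = P(next state t | current state s)
    "high":       {"high": 0.90, "low": 0.09, "disengaged": 0.01},
    "low":        {"high": 0.10, "low": 0.80, "disengaged": 0.10},
    "disengaged": {"high": 0.0,  "low": 0.0,  "disengaged": 1.0},  # absorbing
}
start = {"high": 1.0, "low": 0.0, "disengaged": 0.0}  # all start highly engaged

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def emit(s, obs):
    p = p_contrib[s]
    return log(p if obs == 1 else 1 - p)

def viterbi(obs_seq):
    """Most likely latent engagement path for a 0/1 contribution sequence."""
    v = {s: log(start[s]) + emit(s, obs_seq[0]) for s in states}
    back = []
    for obs in obs_seq[1:]:
        new_v, ptr = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: v[s] + log(trans[s][t]))
            new_v[t] = v[best_s] + log(trans[best_s][t]) + emit(t, obs)
            ptr[t] = best_s
        back.append(ptr)
        v = new_v
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):  # trace back the best path
        path.append(ptr[path[-1]])
    return list(reversed(path))

daily = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(viterbi(daily))
```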
To validate the symptoms data provided by the Britain Breathing application, Vigo, et al (2018) use NHS antihistamine prescription data. They find that the number of antihistamines prescribed are very strongly associated (Pearson’s r = 0.93) with the average lack of wellness.
To test the impact of excluding low-engagement users, the correlation between symptoms and the number of prescriptions was measured for each year and engagement cluster.
To test whether the correlation observed between the prescription data and the reported symptom intensity of low-engagement users was significantly different from the correlation observed between the prescription data and the reported symptom intensity of high-engagement users, the Fisher z-transformation (Fisher, 1921) was applied to each correlation coefficient.
The difference between two z-transformed coefficients approximately follows a normal distribution with known variance, allowing p-values for the two-tailed hypothesis test that the difference between observed correlations is not zero to be calculated (Li, 2019).
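As a sketch, this test can be computed as follows; the two correlation coefficients are hypothetical, and each yearly correlation uses n = 12 months.

```python
import math
from statistics import NormalDist

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test for a difference between two independent Pearson
    correlations via the Fisher z-transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical coefficients for a high- and a low-engagement cluster.
z, p = compare_correlations(0.93, 12, 0.60, 12)
print(round(z, 2), round(p, 3))  # z ~ 2.05, p ~ 0.041
```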
However, since the prescription data is only available at monthly intervals, the n for the correlations was only 12 for each year, making statistical significance hard to ascertain for reasons of statistical power.
To address this, a resampling approach was used. Specifically, a permutation test of independence was conducted using the correlations for all years and engagement clusters to test the null hypothesis that there was no difference between low- and high-engagement groups. This test was weighted based on the number of participants in each year.
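A simplified, unweighted version of such a permutation test is sketched below with hypothetical correlation coefficients; the weighting by participant numbers used in the analysis is omitted for brevity.

```python
import random
from statistics import mean

# Hypothetical per-(year, cluster) correlations for each group.
low_r = [0.60, 0.55, 0.70, 0.65, 0.58]
high_r = [0.90, 0.93, 0.88, 0.91, 0.86]

def permutation_test(a, b, n_perm=10_000, seed=1):
    """Permutation test for a difference in mean correlation between groups:
    shuffle the group labels and count permuted differences at least as
    extreme as the observed one."""
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if abs(perm) >= abs(observed):
            count += 1
    return observed, count / n_perm

obs, p = permutation_test(low_r, high_r)
print(round(obs, 3), p)
```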
To assess the impact of removing unengaged participants on the representativeness of the data, this paper compares the characteristics of respondents from engaged and unengaged groups.
Demographics
Previous studies have explored associations between engagement and demographic characteristics.
Perski, et al (2017) report age, gender, education, employment and ethnicity as the most commonly found associations with engagement and attrition. Similarly, Druce, et al (2017) find that age differs notably between groups (with engaged participants being over 5 years older on average), and a substantially lower proportion of women were present in the “tourist” cluster. Kronkvist and Engstrom (2020) find that age and being female are associated with higher engagement in the study, as do Rintala, et al (2019). Turner, et al (2017) find that race/ethnicity, education, and age are associated with higher levels of engagement.
Britain Breathing asks participants for their age and gender, which allows these characteristics to be compared.
Furthermore, the geo-tagged nature of the reports allows us to determine whether reports were made in an urban or rural setting, using land use data from the UK Office for National Statistics (ONS) Census Rural-Urban Classification (Gledson, et al. Forthcoming?).
Symptom intensity
One possible factor in whether or not a participant chooses to contribute to the application on a given day may be the severity of the symptoms they are experiencing. Fig x shows that reports peak every year around pollen season. One possible mechanism is that as symptoms subside, participants stop reporting.
For comparisons between HMM clusters, Games-Howell tests were used, as they allow for the comparison of more than two groups and do not assume homogeneity of variance or equal sample sizes.
Fig x
Britain Breathing data is available from April 2016 to April 2021. There were 200 982 contributions by 4 748 participants in 2016, 74 183 contributions by 335 participants in 2017, 65 534 contributions by 269 participants in 2018, 19 115 contributions by 78 participants in 2019, 16 446 contributions by 52 participants in 2020, and 182 contributions by 9 users in 2021.
Data was submitted throughout the year, with the exception of 2017, when data collection began in May, and 2021, where data is available from January until the beginning of March.
Data was analysed for each year, with the exception of 2021, due to the limited amount and temporal coverage of data for that year (fig x).
A common feature of citizen science data is high levels of participation inequality (Hackley, 2016). This feature can also be seen in the Britain Breathing data. A Gini score calculated for the number of contributions per participant was ~0.79, indicating a high level of inequality.
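The Gini score over contribution counts can be computed as in the sketch below. The counts shown are invented long-tailed data; the Britain Breathing value of ~0.79 comes from the actual per-participant counts.

```python
def gini(values):
    """Gini coefficient of contribution counts (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Standard rank-based formula on the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical long-tailed contribution counts, one entry per participant.
counts = [1] * 80 + [5] * 15 + [200] * 5
print(round(gini(counts), 2))
```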
Fig X shows how many participants reported on a given number of days. The log-log scale makes clear that by far the most common behaviour was for participants to make a single contribution.
Simple clusterings
The most common way of operationalising engagement is using contribution thresholds (Jaso, et al. 2021). According to this approach, a participant is considered not to have engaged with the study if they contribute fewer than a certain number of reports. Figure X shows how participants in the first year would be classified according to this approach, using four thresholds: 1, 3, 5, and 10.
As described above, Kronkvist and Engstrom (2020) split their participants into abstainers, dedicated participants, and occasional participants. The mean number of contributions by participants in the first year of the study was ~4 and the standard deviation ~13. Figure X shows how participants would be classified according to this approach.
Table x shows how reports are classified in each year according to threshold approaches:
| Year | Threshold | Number of low-engagement users | Number of high-engagement users |
|------|-----------|-------------------------------:|--------------------------------:|
| 1 | 1 | 3261 | 16837 |
| 1 | 3 | 5034 | 15064 |
| 1 | 5 | 5938 | 14160 |
| 1 | 10 | 7299 | 12799 |
| 1 | Kronkvist & Engstrom (17) | 8822 | 11276 |
| 2 | 1 | 73 | 7345 |
| 2 | 3 | 199 | 7219 |
| 2 | 5 | 332 | 7086 |
| 2 | 10 | 592 | 6826 |
| 2 | Kronkvist & Engstrom (59) | 3655 | 3762 |
| 3 | 1 | 99 | 6454 |
| 3 | 3 | 197 | 6356 |
| 3 | 5 | 275 | 6278 |
| 3 | 10 | 458 | 6095 |
| 3 | Kronkvist & Engstrom (77) | 2055 | 4498 |
| 4 | 1 | 22 | 1889 |
| 4 | 3 | 68 | 1843 |
| 4 | 5 | 104 | 1807 |
| 4 | 10 | 142 | 1769 |
| 4 | Kronkvist & Engstrom (79) | 677 | 1234 |
| 5 | 1 | 16 | 1628 |
| 5 | 3 | 44 | 1600 |
| 5 | 5 | 62 | 1582 |
| 5 | 10 | 92 | 1552 |
| 5 | Kronkvist & Engstrom (93) | 566 | 1078 |
Hidden Markov Models
Druce, et al (2017) suggest using hidden Markov models as a better way of operationalising engagement. This paper applies the approach used in their paper to the Britain Breathing data.
Year one
Year two
Year three
Year four
Year five
Cluster sizes: low = 139, high = 676, medium = 797, tourist = 32.
Threshold approaches
Welch’s t-test was used to test for differences in the reported symptoms between high- and low-engagement clusters. The effect size was measured using Cohen’s d (Delacre, et al. 2021). Results were reported according to the APA guidelines (XXX
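The statistics reported below can be reproduced in outline as follows: a stdlib-only sketch of the Welch t statistic, Welch–Satterthwaite degrees of freedom, and Cohen's d with a pooled standard deviation. The symptom scores shown are invented, and the p-values and confidence intervals from the actual analysis are omitted.

```python
import math
from statistics import mean, variance

def welch_t_and_d(x, y):
    """Welch's t statistic, Welch-Satterthwaite df, and Cohen's d (pooled SD)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variances
    nx, ny = len(x), len(y)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / pooled_sd
    return t, df, d

# Hypothetical symptom scores for a low- and a high-engagement group.
low = [0, 1, 1, 2, 2, 3]
high = [0, 0, 1, 1, 2, 2]
t, df, d = welch_t_and_d(low, high)
print(round(t, 2), round(df, 1), round(d, 2))
```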
Year one
Participants in year one were asked several questions about symptoms XXXX I think I can only use the how_im_doing variable since I don’t know which of the 0’s in the other variables are NA’s or just 0’s XXXX
With a threshold of 1 the difference is positive, statistically not significant, and very small (difference = 0.02, 95% CI [0.01, 0.04], t(4564.26) = 1.44, p = 0.150; Cohen’s d = 0.04, 95% CI [-0.02, 0.10]).
With a threshold of 3 the difference is negative, statistically significant, and very small (difference = -0.02, 95% CI [-0.05, -4.42e-03], t(8349.99) = -2.38, p = 0.017; Cohen’s d = -0.05, 95% CI [-0.10, -9.26e-03]).
With a threshold of 5 the difference is negative, statistically significant, and very small (difference = -0.04, 95% CI [-0.06, -0.02], t(10635.78) = -4.03, p < .001; Cohen’s d = -0.08, 95% CI [-0.12, -0.04]).
With a threshold of 10 the difference is negative, statistically significant, and very small (difference = -0.06, 95% CI [-0.08, -0.04], t(14596.24) = -6.12, p < .001; Cohen’s d = -0.10, 95% CI [-0.13, -0.07]).
Using the data-driven threshold suggested by Kronkvist & Engström (2020) the difference is negative, statistically significant, and very small (difference = -0.06, 95% CI [-0.08, -0.05], t(18529.43) = -6.99, p < .001; Cohen’s d = -0.10, 95% CI [-0.13, -0.07]).
With a threshold of 1, there were no significant differences between high- and low-engagement clusters; with thresholds of 3, 5, and 10, there were small, statistically significant differences, with low-engagement groups reporting slightly worse symptoms.
Year two
Participants in year two were not asked about their general well being. They were asked about the intensity of their symptoms regarding their nose, eyes, and breathing.
Nose
With a threshold of 1 the difference is negative, statistically significant, and large (difference = -0.50, 95% CI [-0.73, -0.28], t(73.10) = -4.43, p < .001; Cohen’s d = -1.04, 95% CI [-1.52, -0.54]).
With a threshold of 3 the difference is negative, statistically significant, and large (difference = -0.47, 95% CI [-0.61, -0.33], t(205.67) = -6.59, p < .001; Cohen’s d = -0.92, 95% CI [-1.21, -0.63]).
With a threshold of 5 the difference is negative, statistically significant, and large (difference = -0.45, 95% CI [-0.56, -0.34], t(353.26) = -8.21, p < .001; Cohen’s d = -0.87, 95% CI [-1.09, -0.65]).
With a threshold of 10 the difference is negative, statistically significant, and large (difference = -0.43, 95% CI [-0.51, -0.35], t(668.53) = -10.55, p < .001; Cohen’s d = -0.82, 95% CI [-0.97, -0.66]).
Eyes
With a threshold of 1 the difference is negative, statistically not significant, and very small (difference = -0.09, 95% CI [-0.29, 0.11], t(73.32) = -0.85, p = 0.398; Cohen’s d = -0.20, 95% CI [-0.66, 0.26]).
With a threshold of 3 the difference is negative, statistically not significant, and small (difference = -0.10, 95% CI [-0.22, 0.02], t(208.18) = -1.68, p = 0.093; Cohen’s d = -0.23, 95% CI [-0.51, 0.04]).
With a threshold of 5 the difference is negative, statistically significant, and small (difference = -0.10, 95% CI [-0.20, -8.41e-03], t(359.49) = -2.14, p = 0.033; Cohen’s d = -0.23, 95% CI [-0.43, -0.02]).
With a threshold of 10 the difference is negative, statistically significant, and small (difference = -0.17, 95% CI [-0.24, -0.09], t(678.61) = -4.39, p < .001; Cohen’s d = -0.34, 95% CI [-0.49, -0.19]).
Breathing
With a threshold of 1 the difference is negative, statistically significant, and large (difference = -0.44, 95% CI [-0.68, -0.21], t(72.73) = -3.75, p < .001; Cohen’s d = -0.88, 95% CI [-1.36, -0.39]).
With a threshold of 3 the difference is negative, statistically significant, and medium (difference = -0.30, 95% CI [-0.44, -0.17], t(204.44) = -4.57, p < .001; Cohen’s d = -0.64, 95% CI [-0.92, -0.36]).
With a threshold of 5 the difference is negative, statistically significant, and small (difference = -0.20, 95% CI [-0.30, -0.11], t(352.13) = -4.21, p < .001; Cohen’s d = -0.45, 95% CI [-0.66, -0.24]).
With a threshold of 10 the difference is negative, statistically significant, and small (difference = -0.21, 95% CI [-0.28, -0.14], t(661.51) = -5.75, p < .001; Cohen’s d = -0.45, 95% CI [-0.60, -0.29]).
Overall, the low-engagement groups reported worse symptoms than the high-engagement groups. The effects were large for nose symptoms and small for eye symptoms. For breathing symptoms, the effects were large for a threshold of 1, medium for a threshold of 3, and small for larger thresholds.
Year 3
Participants in year three were not asked about their general well being. They were asked about the intensity of their symptoms regarding their nose, eyes, and breathing.
Nose
With a threshold of 1 the difference is negative, statistically significant, and large (difference = -0.57, 95% CI [-0.80, -0.35], t(99.60) = -5.07, p < .001; Cohen’s d = -1.02, 95% CI [-1.43, -0.60]).
With a threshold of 3 the difference is negative, statistically significant, and large (difference = -0.55, 95% CI [-0.70, -0.40], t(203.18) = -7.26, p < .001; Cohen’s d = -1.02, 95% CI [-1.31, -0.72]).
With a threshold of 5 the difference is negative, statistically significant, and large (difference = -0.53, 95% CI [-0.66, -0.41], t(288.60) = -8.42, p < .001; Cohen’s d = -0.99, 95% CI [-1.23, -0.75]).
With a threshold of 10 the difference is negative, statistically significant, and large (difference = -0.64, 95% CI [-0.74, -0.54], t(495.45) = -12.70, p < .001; Cohen’s d = -1.14, 95% CI [-1.33, -0.95]).
Eyes
With a threshold of 1 the difference is negative, statistically significant, and medium (difference = -0.39, 95% CI [-0.61, -0.17], t(99.92) = -3.57, p < .001; Cohen’s d = -0.71, 95% CI [-1.12, -0.31]).
With a threshold of 3 the difference is negative, statistically significant, and medium (difference = -0.34, 95% CI [-0.49, -0.19], t(204.11) = -4.52, p < .001; Cohen’s d = -0.63, 95% CI [-0.91, -0.35]).
With a threshold of 5 the difference is negative, statistically significant, and medium (difference = -0.27, 95% CI [-0.39, -0.15], t(290.93) = -4.31, p < .001; Cohen’s d = -0.51, 95% CI [-0.74, -0.27]).
With a threshold of 10 the difference is negative, statistically significant, and medium (difference = -0.39, 95% CI [-0.49, -0.29], t(500.47) = -7.66, p < .001; Cohen’s d = -0.68, 95% CI [-0.86, -0.50]).
Breathing
With a threshold of 1 the difference is negative, statistically significant, and large (difference = -0.41, 95% CI [-0.61, -0.21], t(100.38) = -4.11, p < .001; Cohen’s d = -0.82, 95% CI [-1.23, -0.41]).
With a threshold of 3 the difference is negative, statistically significant, and large (difference = -0.40, 95% CI [-0.54, -0.26], t(206.08) = -5.77, p < .001; Cohen’s d = -0.80, 95% CI [-1.09, -0.52]).
With a threshold of 5 the difference is negative, statistically significant, and medium (difference = -0.37, 95% CI [-0.50, -0.25], t(292.05) = -6.04, p < .001; Cohen’s d = -0.71, 95% CI [-0.94, -0.47]).
With a threshold of 10 the difference is negative, statistically significant, and large (difference = -0.45, 95% CI [-0.55, -0.36], t(509.74) = -9.50, p < .001; Cohen’s d = -0.84, 95% CI [-1.02, -0.66]).
Overall the low engagement groups reported significantly worse symptoms than high engagement groups. The effects were large for nose and breathing symptoms and medium for eye symptoms.
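As a rough illustration of the test behind these comparisons, a Welch's t-test with Cohen's d can be sketched as follows. This is a minimal Python sketch with made-up symptom scores, not the study's code or data:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (not pooled)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

low = [2, 2, 3, 1, 2, 3, 2, 2]   # low-engagement symptom scores (made up)
high = [1, 1, 2, 0, 1, 1, 2, 1]  # high-engagement symptom scores (made up)
t, df = welch_t(low, high)
print(round(t, 2), round(df, 1), round(cohens_d(low, high), 2))  # → 3.12 14.0 1.56
```

Welch's version does not pool the variances when computing the standard error, which is why the reported degrees of freedom are non-integer (Delacre, et al. 2021).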
Year 4
Participants in year four were not asked about their general well-being (strictly speaking, 18 of the 1,911 reports do include this item). They were asked about the intensity of their symptoms regarding their nose, eyes, and breathing.
Nose
With a threshold of 1 the difference is negative, statistically significant, and large (difference = -0.64, 95% CI [-1.15, -0.14], t(21.21) = -2.65, p = 0.015; Cohen’s d = -1.15, 95% CI [-2.06, -0.22]).
With a threshold of 3 the difference is negative, statistically significant, and large (difference = -0.62, 95% CI [-0.85, -0.40], t(70.22) = -5.54, p < .001; Cohen’s d = -1.32, 95% CI [-1.83, -0.80]).
With a threshold of 5 the difference is negative, statistically significant, and large (difference = -0.78, 95% CI [-0.95, -0.60], t(110.80) = -8.66, p < .001; Cohen’s d = -1.65, 95% CI [-2.07, -1.21]).
With a threshold of 10 the difference is negative, statistically significant, and large (difference = -0.64, 95% CI [-0.80, -0.48], t(154.29) = -7.96, p < .001; Cohen’s d = -1.28, 95% CI [-1.63, -0.93]).
Eyes
With a threshold of 1 the difference is negative, statistically not significant, and medium (difference = -0.37, 95% CI [-0.80, 0.06], t(21.31) = -1.79, p = 0.088; Cohen’s d = -0.78, 95% CI [-1.65, 0.11]).
With a threshold of 3 the difference is negative, statistically significant, and large (difference = -0.64, 95% CI [-0.88, -0.41], t(70.17) = -5.49, p < .001; Cohen’s d = -1.31, 95% CI [-1.82, -0.79]).
With a threshold of 5 the difference is negative, statistically significant, and large (difference = -0.68, 95% CI [-0.88, -0.48], t(109.40) = -6.67, p < .001; Cohen’s d = -1.28, 95% CI [-1.68, -0.86]).
With a threshold of 10 the difference is negative, statistically significant, and large (difference = -0.47, 95% CI [-0.65, -0.30], t(152.54) = -5.27, p < .001; Cohen’s d = -0.85, 95% CI [-1.18, -0.52]).
Breathing
With a threshold of 1 the difference is negative, statistically not significant, and very small (difference = -0.05, 95% CI [-0.35, 0.25], t(21.64) = -0.35, p = 0.730; Cohen’s d = -0.15, 95% CI [-0.99, 0.69]).
With a threshold of 3 the difference is negative, statistically significant, and medium (difference = -0.28, 95% CI [-0.50, -0.05], t(70.42) = -2.48, p = 0.015; Cohen’s d = -0.59, 95% CI [-1.07, -0.11]).
With a threshold of 5 the difference is negative, statistically significant, and medium (difference = -0.26, 95% CI [-0.43, -0.08], t(111.89) = -2.93, p = 0.004; Cohen’s d = -0.55, 95% CI [-0.93, -0.18]).
With a threshold of 10 the difference is negative, statistically significant, and small (difference = -0.19, 95% CI [-0.34, -0.05], t(159.59) = -2.66, p = 0.009; Cohen’s d = -0.42, 95% CI [-0.73, -0.11]).
Overall the low engagement groups reported significantly worse symptoms than high engagement groups, except when a threshold of one was used for breathing and eye symptoms, where the results were not significant. For eye and nose symptoms the significant effects were all large; for breathing the effects were small to medium.
Year 5
Same issue as with year one: it is not clear which variables to use.
HMM approach
Games-Howell tests were used to explore differences in symptom intensity between the clusters of engagement operationalised using the hidden Markov model approach.
Year one
ggbetweenstats(
data = y1clust,
x = cluster, ## grouping/independent variable
y = how_im_doing, ## dependent variables
type = "parametric",
pairwise.display = "all")
The mean response for how the participants were feeling was 0.66 for the
tourist cluster, 0.64 for the low engagement cluster, 0.66 for the
medium engagement cluster, and 0.61 for the high engagement cluster.
Only the differences between the high and , and the high and were statistically significant.
Year two
ggbetweenstats(
data = y2clust,
x = cluster, ## grouping/independent variable
y = nose, ## dependent variables
type = "parametric",
pairwise.display = "all")
The mean response for how the participants were feeling was 1.31 for the
tourist cluster, 0.99 for the low engagement cluster, 0.61 for the
medium engagement cluster, and 0.72 for the high engagement cluster. All
differences were significant.
Year three
ggbetweenstats(
data = y3clust,
x = cluster, ## grouping/independent variable
y = nose, ## dependent variables
type = "parametric",
pairwise.display = "all")
The mean response for nose symptoms was 0.66 for the tourist cluster, 0.64 for the low engagement cluster, 0.66 for the medium engagement cluster, and 0.61 for the high engagement cluster.
Year four
ggbetweenstats(
data = y4clust,
x = cluster, ## grouping/independent variable
y = nose, ## dependent variables
type = "parametric",
pairwise.display = "all")
In year four participants in the tourist cluster reported the worst
symptoms (mean 1.29) followed by the low engagement (mean 0.97), medium
engagement (mean 0.58), and high engagement (mean 0.03) clusters. All
differences were significant with the exception of the difference
between the tourist and low engagement cluster (Holm-corrected p =
0.45).
Year five
ggbetweenstats(
data = y5clust,
x = cluster, ## grouping/independent variable
y = how_im_doing, ## dependent variables
type = "parametric",
pairwise.display = "all")
#### Gender
Pearson’s chi-squared test was used to test for differences in gender between high and low engagement clusters. The strength of the association was measured using Cramér’s V (Cramér, 1946). Cramér’s V is bounded between zero and one, with zero indicating no association and one indicating a perfect association.
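As an illustration, the chi-squared statistic and Cramér's V for a 2×2 table can be computed as follows; the counts below are made up, not taken from the study:

```python
import math

def cramers_v(table):
    """Cramer's V from a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = sum((obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
               for i, row in enumerate(table) for j, obs in enumerate(row))
    k = min(len(table), len(table[0])) - 1  # min(rows, cols) - 1
    return math.sqrt(chi2 / (n * k))

# rows: high / low engagement; columns: female / male (made-up counts)
table = [[530, 470], [470, 530]]
print(round(cramers_v(table), 3))  # → 0.06
```

Note that with large samples even a tiny V (like the 0.06 here) can come with a significant chi-squared p-value, which is the pattern in several of the tables below.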
Threshold approaches
| Year | Threshold | % Female (high) | % Female (low) | p | Cramer’s V |
|---|---|---|---|---|---|
| 1 | 1 | 52 | 47 | ~0.00 | 0.040 |
| 1 | 3 | 53 | 47 | ~0.00 | 0.050 |
| 1 | 5 | 53 | 49 | ~0.00 | 0.030 |
| 1 | 10 | 53 | 49 | ~0.00 | 0.030 |
| 2 | 1 | 55 | 58 | ~0.6 | 0.000 |
| 2 | 3 | 55 | 54 | ~0.94 | 0.000 |
| 2 | 5 | 55 | 53 | ~0.62 | 0.000 |
| 2 | 10 | 54 | 57 | ~0.28 | 0.005 |
| 3 | 1 | 45 | 64 | ~0.00 | 0.040 |
| 3 | 3 | 44 | 65 | ~0.00 | 0.070 |
| 3 | 5 | 44 | 66 | ~0.00 | 0.090 |
| 3 | 10 | 44 | 59 | ~0.00 | 0.070 |
| 4 | 1 | 23 | 73 | ~0.00 | 0.120 |
| 4 | 3 | 22 | 82 | ~0.00 | 0.260 |
| 4 | 5 | 21 | 77 | ~0.00 | 0.300 |
| 4 | 10 | 20 | 70 | ~0.00 | 0.300 |
| 5 | 1 | 19 | 50 | ~0.00 | 0.070 |
| 5 | 3 | 18 | 61 | ~0.00 | 0.170 |
| 5 | 5 | 18 | 58 | ~0.00 | 0.190 |
| 5 | 10 | 17 | 65 | ~0.00 | 0.280 |
In the first year, the proportion of reports made by women was always higher in the high engagement groups compared to low engagement groups, regardless of the threshold chosen. The associations were very small but significant.
In the second year, the proportion of reports made by women was higher in the high engagement groups only when using thresholds of 3 and 5. The associations were very small and were not statistically significant.
In the third year, the proportion of reports made by women was always higher in the low engagement groups compared to high engagement groups, regardless of the threshold chosen. The associations were small and significant.
In the fourth year, the proportion of reports made by women was always higher in the low engagement groups compared to high engagement groups, regardless of the threshold chosen. The associations were moderate and significant.
In the fifth year, the proportion of reports made by women was always higher in the low engagement groups compared to high engagement groups, regardless of the threshold chosen. The associations were moderate and significant.
HMM approach
Year one
In the first year the proportion of reports made by women was significantly higher in the high and medium clusters. There was no significant difference for the low and tourist clusters.
Year two
In the second year the proportion of reports made by women was
significantly higher in the high and medium clusters. There was no
significant difference for the low and tourist clusters.
Year three
In the third year, the proportion of reports made by women was significantly higher in all clusters, particularly in the high engagement cluster.
Year four
Year five
#### Rural/Urban
Threshold approaches
| Year | Threshold | % Rural (high) | % Rural (low) | p | Cramer’s V |
|---|---|---|---|---|---|
| 1 | 1 | 20 | 19 | 0.65 | 0.00 |
| 1 | 3 | 19 | 20 | 0.70 | 0.00 |
| 1 | 5 | 19 | 20 | 0.30 | 0.00 |
| 1 | 10 | 19 | 21 | ~0.00 | 0.02 |
| 2 | 1 | 24 | 20 | 0.39 | 0.00 |
| 2 | 3 | 24 | 15 | ~0.00 | 0.03 |
| 2 | 5 | 24 | 16 | ~0.00 | 0.04 |
| 2 | 10 | 25 | 18 | ~0.00 | 0.04 |
| 3 | 1 | 29 | 27 | 0.57 | 0.00 |
| 3 | 3 | 30 | 23 | 0.07 | 0.02 |
| 3 | 5 | 30 | 26 | 0.16 | 0.01 |
| 3 | 10 | 30 | 26 | 0.08 | 0.02 |
| 4 | 1 | 36 | 23 | 0.21 | 0.02 |
| 4 | 3 | 36 | 19 | ~0.00 | 0.06 |
| 4 | 5 | 36 | 22 | ~0.00 | 0.06 |
| 4 | 10 | 37 | 16 | ~0.00 | 0.11 |
| 5 | 1 | 26 | 12 | 0.22 | 0.02 |
| 5 | 3 | 26 | 11 | 0.03 | 0.05 |
| 5 | 5 | 26 | 8 | ~0.00 | 0.08 |
| 5 | 10 | 26 | 14 | ~0.01 | 0.06 |
HMM approach
Year one
Year two
Year three
Year four
Year five
#### Has Hayfever
sum(is.na(df$hay_fever))
## [1] 0
table(df$hay_fever)
##
## 0 1 No
## 3873 33612 321
Strangely, this double coding happens within the same versions (v1). Assuming No = 0…
Year one
p1 <- y1clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
ggbarstats(x = hay_fever, y = cluster)
p2 <- y1clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
group_by(user_id) %>%
slice(1) %>%
ungroup() %>%
ggbarstats(x = hay_fever, y = cluster)
p1/p2
Year two
p1 <- y2clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
ggbarstats(x = hay_fever, y = cluster)
p2 <- y2clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
group_by(user_id) %>%
slice(1) %>%
ungroup() %>%
ggbarstats(x = hay_fever, y = cluster)
p1/p2
Year three
p1 <- y3clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
ggbarstats(x = hay_fever, y = cluster)
p2 <- y3clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
group_by(user_id) %>%
slice(1) %>%
ungroup() %>%
ggbarstats(x = hay_fever, y = cluster)
p1/p2
Year four
p1 <- y4clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
ggbarstats(x = hay_fever, y = cluster)
p2 <- y4clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
group_by(user_id) %>%
slice(1) %>%
ungroup() %>%
ggbarstats(x = hay_fever, y = cluster)
p1/p2
Year five
p1 <- y5clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
ggbarstats(x = hay_fever, y = cluster)
p2 <- y5clust %>%
mutate(hay_fever = recode(hay_fever,
"No" = "0")) %>%
group_by(user_id) %>%
slice(1) %>%
ungroup() %>%
ggbarstats(x = hay_fever, y = cluster)
p1/p2
NHS prescription data are available from OpenPrescribing.net; however, these data are only available for prescriptions made in England. Reports made outside of England (1,697, or ~8.4% of reports) were therefore removed from the data, as they could not be linked to prescription data.
With all data included, the Pearson coefficient for the correlation between symptom intensity and the number of prescribed items was 0.68 for year one, 0.79 for year two, 0.93 for year three, 0.52 for year four, and 0.21 for year five.
Threshold approaches
Year 1
For a threshold of 1, the correlation with low-engagement reports was ~0.51, the correlation with high-engagement reports was ~0.71. The difference is not statistically significant with a p-value of ~0.43.
For a threshold of 3, the correlation with low-engagement reports was ~0.55, the correlation with high-engagement reports was ~0.76. The difference is not statistically significant with a p-value of ~0.36.
For a threshold of 5, the correlation with low-engagement reports was ~0.53, the correlation with high-engagement reports was ~0.76. The difference is not statistically significant with a p-value of ~0.32.
For a threshold of 10, the correlation with low-engagement reports was ~0.56, the correlation with high-engagement reports was ~0.81. The difference is not statistically significant with a p-value of ~0.22.
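The p-values for these differences between independent correlations are presumably obtained via Fisher's z transformation (Fisher, 1921, in the reference list). A minimal Python sketch, assuming twelve monthly observations per group (an assumption; the actual ns are not stated here):

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal tail
    return z, p

# threshold 1, year one: high ~0.71 vs low ~0.51, assuming n = 12 months each
z, p = compare_correlations(0.71, 12, 0.51, 12)
print(round(z, 2), round(p, 2))  # → 0.69 0.49
```

With only twelve months per group the standard error is large, which is consistent with sizeable correlation differences failing to reach significance in the year-one and year-two results.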
Year 2
For a threshold of 1, the correlation with high-engagement reports was ~0.92, the correlation with low-engagement reports was ~0.85. The difference is not statistically significant with a p-value of ~0.47.
For a threshold of 3, the correlation with high-engagement reports was ~0.92, the correlation with low-engagement reports was ~0.75. The difference is not statistically significant with a p-value of ~0.16.
For a threshold of 5, the correlation with high-engagement reports was ~0.92, the correlation with low-engagement reports was ~0.74. The difference is not statistically significant with a p-value of ~0.16.
For a threshold of 10, the correlation with high-engagement reports was ~0.91, the correlation with low-engagement reports was ~0.84. The difference is not statistically significant with a p-value of ~0.49.
Year 3
For a threshold of 1, the correlation with high-engagement reports was ~0.92, the correlation with low-engagement reports was ~0.52. The difference is statistically significant with a p-value of ~0.003.
For a threshold of 3, the correlation with high-engagement reports was ~0.93, the correlation with low-engagement reports was ~0.68. The difference is statistically significant with a p-value of ~0.018.
For a threshold of 5, the correlation with high-engagement reports was ~0.93, the correlation with low-engagement reports was ~0.69. The difference is statistically significant with a p-value of ~0.016.
For a threshold of 10, the correlation with high-engagement reports was ~0.92, the correlation with low-engagement reports was ~0.73. The difference is statistically significant with a p-value of ~0.042.
Year 4
For a threshold of 1, the correlation with high-engagement reports was ~0.70, the correlation with low-engagement reports was ~0.18. The difference is not statistically significant with a p-value of ~0.09.
For a threshold of 3, the correlation with high-engagement reports was ~0.45, the correlation with low-engagement reports was ~0.32. The difference is not statistically significant with a p-value of ~0.70.
For a threshold of 5, the correlation with high-engagement reports was ~0.44, the correlation with low-engagement reports was ~0.38. The difference is not statistically significant with a p-value of ~0.85.
For a threshold of 10, the correlation with high-engagement reports was ~0.44, the correlation with low-engagement reports was ~0.49. The difference is not statistically significant with a p-value of ~0.84.
Year 5
For a threshold of 1, the correlation with high-engagement reports was ~0.17, the correlation with low-engagement reports was ~0.36. The difference is not statistically significant with a p-value of ~0.72.
For a threshold of 3, the correlation with high-engagement reports was ~0.23, the correlation with low-engagement reports was ~0.42. The difference is not statistically significant with a p-value of ~0.57.
For a threshold of 5, the correlation with high-engagement reports was ~0.19, the correlation with low-engagement reports was ~0.48. The difference is not statistically significant with a p-value of ~0.48.
For a threshold of 10, the correlation with high-engagement reports was ~0.22, the correlation with low-engagement reports was ~0.39. The difference is not statistically significant with a p-value of ~0.63.
##
## Asymptotic General Independence Test
##
## data: high by low
## Z = 2.5332, p-value = 0.0113
## alternative hypothesis: two.sided
##
## Asymptotic General Independence Test
##
## data: high by low
## Z = 100.05, p-value < 2.2e-16
## alternative hypothesis: two.sided
A permutation test of independence was conducted using the coin package (Hothorn, et al. 2008). The null hypothesis of no difference between low and high engagement groups was rejected at the 0.05 level, with a p-value of 0.0113 (or p < 2.2e-16 when using weights).
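The logic of the permutation test can be sketched without the coin package: repeatedly shuffle the pooled values, reassign them to the two groups, and count how often the shuffled difference is at least as extreme as the observed one. A minimal Python sketch with made-up data (the analysis itself uses coin's asymptotic test in R):

```python
import random

def perm_test(a, b, n_perm=10000, seed=1):
    """Two-sided permutation p-value for a difference in group means."""
    random.seed(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_perm):
        random.shuffle(pooled)  # random reassignment of reports to groups
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction

low = [2, 3, 2, 2, 3, 2, 1, 3]   # made-up symptom scores
high = [1, 1, 2, 0, 1, 1, 2, 1]
p = perm_test(low, high)
print(p < 0.05)  # → True
```

coin replaces the Monte Carlo loop with an asymptotic approximation of the permutation distribution, which is why it can report p-values as small as 2.2e-16.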
HMM approach
Year one
tourist <- y1clust %>%
filter(cluster == "tourist") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_t = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_vigo, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_t)
low <- y1clust %>%
filter(cluster == "low") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_l = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_vigo, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_l)
medium <- y1clust %>%
filter(cluster == "medium") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_m = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_vigo, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_m)
high <- y1clust %>%
filter(cluster == "high") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_h = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_vigo, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_h)
combined <- tourist %>%
left_join(low, by = "lubridate::month(date)") %>%
left_join(medium,by = "lubridate::month(date)") %>%
left_join(high,by = "lubridate::month(date)") %>%
dplyr::select(mean_how_t,
mean_how_l,
mean_how_m,
mean_how_h,
items.x)
cor(combined$items.x, combined$mean_how_t, method= "pearson")
## [1] 0.5370265
cor(combined$items.x, combined$mean_how_l, method= "pearson")
## [1] 0.1933603
cor(combined$items.x, combined$mean_how_m, method= "pearson")
## [1] 0.5124759
cor(combined$items.x, combined$mean_how_h, method= "pearson")
## [1] 0.7552904
Year two
presc_y2 <- presc %>%
filter(lubridate::year(date) == 2017) %>%
mutate(month = lubridate::month(date)) %>%
group_by(month) %>%
summarise(sum = sum(items))
tourist <- y2clust %>%
filter(cluster == "tourist") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_t = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y2, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_t)
low <- y2clust %>%
filter(cluster == "low") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_l = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y2, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_l)
medium <- y2clust %>%
filter(cluster == "medium") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_m = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y2, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_m)
high <- y2clust %>%
filter(cluster == "high") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_h = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y2, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_h)
combined <- tourist %>%
left_join(low, by = "lubridate::month(date)") %>%
left_join(medium,by = "lubridate::month(date)") %>%
left_join(high,by = "lubridate::month(date)") %>%
dplyr::select(mean_how_t,
mean_how_l,
mean_how_m,
mean_how_h,
sum.x) %>% drop_na()
cor(combined$sum.x, combined$mean_how_t, method= "pearson")
## [1] 0.758369
cor(combined$sum.x, combined$mean_how_l, method= "pearson")
## [1] 0.6756833
cor(combined$sum.x, combined$mean_how_m, method= "pearson")
## [1] 0.9691427
cor(combined$sum.x, combined$mean_how_h, method= "pearson")
## [1] 0.9055742
Year three
presc_y3 <- presc %>%
filter(lubridate::year(date) == 2018) %>%
mutate(month = lubridate::month(date)) %>%
group_by(month) %>%
summarise(sum = sum(items))
tourist <- y3clust %>%
filter(cluster == "tourist") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_t = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y3, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_t)
low <- y3clust %>%
filter(cluster == "low") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_l = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y3, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_l)
medium <- y3clust %>%
filter(cluster == "medium") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_m = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y3, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_m)
high <- y3clust %>%
filter(cluster == "high") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_h = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y3, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_h)
combined <- tourist %>%
left_join(low, by = "lubridate::month(date)") %>%
left_join(medium,by = "lubridate::month(date)") %>%
left_join(high,by = "lubridate::month(date)") %>%
dplyr::select(mean_how_t,
mean_how_l,
mean_how_m,
mean_how_h,
sum.x) %>% drop_na()
cor(combined$sum.x, combined$mean_how_t, method= "pearson")
## [1] 0.7446511
cor(combined$sum.x, combined$mean_how_l, method= "pearson")
## [1] 0.3066062
cor(combined$sum.x, combined$mean_how_m, method= "pearson")
## [1] 0.9059879
cor(combined$sum.x, combined$mean_how_h, method= "pearson")
## [1] 0.781092
Year four
presc_y4 <- presc %>%
filter(lubridate::year(date) == 2019) %>%
mutate(month = lubridate::month(date)) %>%
group_by(month) %>%
summarise(sum = sum(items))
tourist <- y4clust %>%
filter(cluster == "tourist") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_t = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y4, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_t)
low <- y4clust %>%
filter(cluster == "low") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_l = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y4, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_l)
medium <- y4clust %>%
filter(cluster == "medium") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_m = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y4, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_m)
high <- y4clust %>%
filter(cluster == "high") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_h = mean(nose),
median_how = median(how_im_doing)) %>%
left_join(presc_y4, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_h)
combined <- tourist %>%
left_join(low, by = "lubridate::month(date)") %>%
left_join(medium,by = "lubridate::month(date)") %>%
left_join(high,by = "lubridate::month(date)") %>%
dplyr::select(mean_how_t,
mean_how_l,
mean_how_m,
mean_how_h,
sum.x) %>% drop_na()
cor(combined$sum.x, combined$mean_how_t, method= "pearson")
## [1] 0.1823264
cor(combined$sum.x, combined$mean_how_l, method= "pearson")
## [1] 0.270193
cor(combined$sum.x, combined$mean_how_m, method= "pearson")
## [1] 0.5528867
cor(combined$sum.x, combined$mean_how_h, method= "pearson")
## [1] 0.2024451
Year five
presc_y5 <- presc %>%
filter(lubridate::year(date) == 2020) %>%
mutate(month = lubridate::month(date)) %>%
group_by(month) %>%
summarise(sum = sum(items))
tourist <- y5clust %>%
filter(cluster == "tourist") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_t = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_y5, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_t)
low <- y5clust %>%
filter(cluster == "low") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_l = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_y5, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_l)
medium <- y5clust %>%
filter(cluster == "medium") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_m = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_y5, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_m)
high <- y5clust %>%
filter(cluster == "high") %>%
group_by(lubridate::month(date)) %>%
summarise(reports = n(),
mean_how_h = mean(how_im_doing),
median_how = median(how_im_doing)) %>%
left_join(presc_y5, by = c( "lubridate::month(date)" = "month" )) %>%
drop_na(mean_how_h)
combined <- tourist %>%
left_join(low, by = "lubridate::month(date)") %>%
left_join(medium,by = "lubridate::month(date)") %>%
left_join(high,by = "lubridate::month(date)") %>%
dplyr::select(mean_how_t,
mean_how_l,
mean_how_m,
mean_how_h,
sum.x) %>% drop_na()
cor(combined$sum.x, combined$mean_how_t, method= "pearson")
## [1] NA
cor(combined$sum.x, combined$mean_how_l, method= "pearson")
## [1] NA
cor(combined$sum.x, combined$mean_how_m, method= "pearson")
## [1] NA
cor(combined$sum.x, combined$mean_how_h, method= "pearson")
## [1] NA
Computing these correlations was not possible due to data issues; all values returned NA.
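One likely culprit for the NA values is missing data: Pearson's r is undefined if either series contains a missing value, unless incomplete pairs are dropped first (in R, cor(..., use = "complete.obs")). A minimal Python sketch of the pairwise-deletion behaviour, with made-up numbers:

```python
import math

def pearson(x, y):
    """Pearson's r after dropping pairs with a missing value."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

items = [100.0, 120.0, 150.0, 130.0]      # monthly prescribed items (made up)
symptoms = [0.5, float("nan"), 0.9, 0.7]  # one month missing
print(round(pearson(items, symptoms), 2))  # → 0.99
```

Without the filtering step, the missing pair would propagate and the result would be NaN, mirroring the NA output above.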
In all versions of the application, users were asked about their nose, eye, and breathing symptoms. In versions v2016 and v2, users were also asked “How are you feeling today?”, with the possible answers being “Great” (0), “So-so” (1), and “Bad” (2).
| Version | How are you feeling today? | Symptoms | n |
|---|---|---|---|
| v2016 | Yes | Yes | 20 098 |
| v1 | No | Yes | 17 084 |
| v2 | Yes | Yes | 624 |
The scores for how_im_doing and the individual symptoms can be considered psychometric antonyms.
Jaso, et al. (2021) define psychometric antonyms as “items that theoretically or logically should have a large difference” (2021: 4).
In the case of the Britain Breathing data, we might logically expect that participants who report feeling “Bad” have high symptom scores, and that participants who report feeling “Great” have lower symptom scores.
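One way to operationalise this check, sketched below in Python with made-up reports, is to compute each participant's correlation between how_im_doing and a symptom score. Because both are coded so that higher values mean feeling worse, honest responding should yield a strong positive correlation, while near-zero or negative values would flag possible carelessness:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

how_im_doing = [0, 1, 2, 1, 0, 2]  # 0 = Great, 1 = So-so, 2 = Bad
nose = [0, 1, 3, 2, 0, 3]          # symptom intensity (made up)
r = pearson(how_im_doing, nose)
print(round(r, 2))  # → 0.97
```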
Year one
This seems too perfect! I don’t find it plausible that not a single person (out of ~9,000) who said they were feeling great reported any symptoms, even mild ones. Is this a coding artifact? If nothing else, you’d think someone’s finger would slip eventually.
**I tried doing the analysis anyway with the reduced scale, but there really doesn’t seem to be any difference.**
Year 1
Threshold of 1
Threshold of 3
Threshold of 5
Threshold of 10
Year 2
Threshold of 1
Threshold of 3
Threshold of 5
Threshold of 10
Similar here, but even stranger: not a single person who said they were feeling bad reported any symptoms?
Year one
Do lower engagement users provide lower quality data than high engagement users?
Does the way engagement is operationalised moderate the relationship between data quality and engagement?
How does including/excluding low-engagement users affect the representativeness of the sample?
Do certain ways of operationalising engagement provide systematically better trade-offs in terms of data quality, sample size, and representativeness?
# Limitations
Several of the indicators of carelessness identified in the EMA literature were not applicable to the Britain Breathing data set. In particular, a number of commonly identified indicators of carelessness are based on the time taken to complete either individual items or the whole assessment.
Whether a participant contributed on a given day was coded as binary in the paper; however, in some rare instances participants reported more than once in a given day, potentially indicating a higher level of engagement. Further study should explore…
There is some evidence that, when applying hidden Markov models to time series data, estimating the whole distribution of model parameters using Bayesian methods such as Markov chain Monte Carlo (MCMC) is preferable, in terms of both accuracy and stability, to estimating a single model using maximum likelihood, as we have done in this paper (Sipos, et al. 2019). Further research should explore whether these findings …
Aceves-Bueno, E., Adeleye, A. S., Feraud, M., Huang, Y., Tao, M., Yang, Y., & Anderson, S. E. (2017). The accuracy of citizen science data: a quantitative review. Bulletin of the Ecological Society of America, 98(4), 278-290.
Agnello, G., Vercammen, A., & Knight, A. T. (2022). Understanding citizen scientists’ willingness to invest in, and advocate for, conservation. Biological Conservation, 265, 109422.
Basiri, A., Haklay, M., Foody, G., & Mooney, P. (2019). Crowdsourced geospatial data quality: Challenges and future directions. International Journal of Geographical Information Science, 33(8), 1588-1593.
Balázs, B., Mooney, P., Nováková, E., Bastin, L., & Arsanjani, J.J (2021). “Data Quality in Citizen Science” in Vohland, K., Land-Zandstra, A., Ceccaroni, L., Lemmens, R., Perelló, J., Ponti, M., … & Wagenknecht, K. The science of citizen science. Springer Nature.
Brovelli, M. A., Minghini, M., Molinari, M., & Mooney, P. (2017). Towards an automated comparison of OpenStreetMap with authoritative road datasets. Transactions in GIS, 21(2), 191-206.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021). Why Hedges’ g* based on the non-pooled standard deviation should be reported with Welch’s t-test.
Druce, K. L., McBeth, J., van der Veer, S. N., Selby, D. A., Vidgen, B., Georgatzis, K., … & Dixon, W. G. (2017). Recruitment and ongoing engagement in a UK smartphone study examining the association between weather and pain: cohort study. JMIR mHealth and uHealth, 5(11), e168.
Elliott, K. C., & Rosenberg, J. (2019). Philosophical foundations for citizen science. Citizen Science: Theory and Practice, 4(1).
Estellés-Arolas, E. (2020). Using crowdsourcing for a safer society: When the crowd rules. European Journal of Criminology, 1477370820916439.
Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. Metron, 1, 1-32.
Fonte, C.C., Antoniou, V., Bastin, L., Estima, J., Arsanjani, J.J., Bayas, J.C.L., See, L. and Vatseva, R. (2017). Assessing VGI data quality. Mapping and the citizen sensor, 137-163.
Haklay, M. E. (2016). Why is participation inequality important?. Ubiquity Press.
Hothorn, T., Hornik, K., van de Wiel, M. A., & Zeileis, A. (2008). Implementing a class of permutation tests: The coin package. Journal of Statistical Software, 28(8), 1-23.
Jaso, B. A., Kraus, N. I., & Heller, A. S. (2021). Identification of careless responding in ecological momentary assessment research: From posthoc analyses to real-time data monitoring. Psychological Methods.
Johnson, P. A., & Sieber, R. E. (2013). Situating the adoption of VGI by government. In Crowdsourcing geographic knowledge (pp. 65-81). Springer, Dordrecht.
Kronkvist, K., & Engström, A. (2020). Feasibility of gathering momentary and daily assessments of fear of crime using a smartphone application (STUNDA): Methodological considerations and findings from a study among Swedish university students. Methodological Innovations, 13(3), 2059799120980306.
Lukyanenko, R., Parsons, J., & Wiersma, Y. F. (2016). Emerging problems of data quality in citizen science. Conservation Biology, 30(3), 447-449.
McGonagle, A. K., Huang, J. L., & Walsh, B. M. (2016). Insufficient effort survey responding: An under‐appreciated problem in work and organisational health psychology research. Applied Psychology, 65(2), 287-321.
Minghini, M., Antoniou, V., Fonte, C.C., Estima, J., Olteanu-Raimond, A.M., See, L., Laakso, M., Skopeliti, A., Mooney, P., Jokar Arsanjani, J. and Lupia, F. (2017). “The relevance of protocols for VGI collection.” in
OpenPrescribing.net (2022). The DataLab, University of Oxford. Available online: https://openprescribing.net Last accessed: XXXX
Pebesma, E. J. (2018). Simple features for R: standardized support for spatial vector data. R J., 10(1), 439.
Perski, O., Blandford, A., West, R., & Michie, S. (2017). Conceptualising engagement with digital behaviour change interventions: A systematic review using principles from critical interpretive synthesis. Translational Behavioral Medicine, 7(2), 254-267.
Riesch, H., & Potter, C. (2014). Citizen science as seen by scientists: Methodological, epistemological and ethical dimensions. Public Understanding of Science, 23(1), 107-120.
Samulowska, M., Chmielewski, S., Raczko, E., Lupa, M., Myszkowska, D., & Zagajewski, B. (2021). Crowdsourcing without Data Bias: Building a Quality Assurance System for Air Pollution Symptom Mapping. ISPRS International Journal of Geo-Information, 10(2), 46.
Sipos, I. R., Ceffer, A., Horváth, G., & Levendovszky, J. (2019). Parallel MCMC sampling of AR-HMMs for prediction based option trading. Algorithmic Finance, 8(1-2), 47-55.
Solymosi, R., Buil-Gil, D., Vozmediano, L., & Guedes, I. S. (2021). Towards a place-based measure of fear of crime: A systematic review of app-based and crowdsourcing approaches. Environment and Behavior, 53(9), 1013-1044.
Sun, J., Rhemtulla, M., & Vazire, S. (2020). Eavesdropping on Missing Data: What Are University Students Doing When They Miss Experience Sampling Reports?. Personality and Social Psychology Bulletin, 0146167220964639.
Ternovski, J., & Orr, L. (2022). A Note on Increases in Inattentive Online Survey-Takers Since 2020. Journal of Quantitative Description: Digital Media, 2.
Trojan, J., Schade, S., Lemmens, R., & Frantál, B. (2019). Citizen science as a new approach in Geography and beyond: Review and reflections. Moravian Geographical Reports, 27(4), 254-264.
Vigo, M., Hassan, L., Vance, W., Jay, C., Brass, A., & Cruickshank, S. (2018). Britain Breathing: using the experience sampling method to collect the seasonal allergy symptoms of a country. Journal of the American Medical Informatics Association, 25(1), 88-92.
Visser, I., & Speekenbrink, M. (2010). depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36(7), 1-21. URL https://www.jstatsoft.org/v36/i07/.
Wiggins, A., & Wilbanks, J. (2019). The rise of citizen science in health and biomedical research. The American Journal of Bioethics, 19(8), 3-14.
Yardley, L., Spring, B. J., Riper, H., Morrison, L. G., Crane, D. H., Curtis, K., … & Blandford, A. (2016). Understanding and promoting effective engagement with digital behaviour change interventions. American Journal of Preventive Medicine, 51(5), 833-842.
The code for this project is available at https://github.com/NathanKhadaroo/BB_Paper.
The data for this project are not publicly available due to their potentially sensitive nature; however, researchers may request access from XXXX.